Digital Storytelling Book Generator with MIDI-to-Singing
Abstract
Digital storytelling books are an important source of knowledge for blind readers, but producing them usually takes considerable time and effort. Automatic procedures built into a speech synthesis system can help by reading books aloud from electronic content. In this paper, we give a practical description of a digital storytelling book generator that combines a free text-to-speech (TTS) program with a MIDI-to-Singing toolkit. A certain amount of emotional TTS customization can be obtained by manipulating the time-pitch contour of the synthesized acoustic waveform. MIDI-to-Singing voices can be generated automatically, with special emphasis on lyrical or storytelling-style content, for which the uninteresting voices produced by traditional TTS programs are usually off-putting. Rule-based approaches generate time-pitch values from rules that describe how the pitch frequency behaves over time; pitch values fluctuate within a certain range that depends on the intended emotion. The MIDI-to-Singing voice synthesis maps pitch frequency values onto the 12-semitone melodic scale and extracts semitone intervals for each emotional state. In the current version of the system, a user can style the synthesized voice by selecting a standard male or female voice in combination with one of 12 predefined expressive styles: Neutral, Monotonic, Lowly-pitched, Highly-pitched, Rising-pitched, Falling-pitched, Happy, Sad, Fear, Anger, Randomly-pitched, and Melody-aligning (singing), using a small set of musical notes. A subjective test shows that listeners found synthetic conversations based on MIDI-to-Singing with customized styles preferable to traditional ones, and more natural, intelligible, and enjoyable. Finally, the resulting digital talking-book recordings can be heard on the web site, allowing comparison between human speech and MIDI-to-Singing synthesized speech.
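The pitch-to-scale mapping mentioned in the abstract can be illustrated with a minimal sketch. It assumes standard 12-tone equal temperament with A4 = 440 Hz (MIDI note 69). The function names and the sample contour are hypothetical and are not taken from the paper or its toolkit.

```python
import math

# Assumption: 12-tone equal temperament, A4 = 440 Hz = MIDI note 69.
A4_HZ = 440.0
A4_MIDI = 69


def hz_to_midi(freq_hz):
    """Snap a frequency in Hz to the nearest MIDI note number."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return round(A4_MIDI + 12 * math.log2(freq_hz / A4_HZ))


def midi_to_hz(note):
    """Return the equal-tempered frequency in Hz of a MIDI note number."""
    return A4_HZ * 2 ** ((note - A4_MIDI) / 12)


def semitone_intervals(pitch_track_hz):
    """Quantize a pitch track and return the semitone steps between notes."""
    notes = [hz_to_midi(f) for f in pitch_track_hz]
    return [b - a for a, b in zip(notes, notes[1:])]


if __name__ == "__main__":
    # Hypothetical pitch contour in Hz (A3, B3, C4, A3); not data from the paper.
    contour = [220.0, 246.94, 261.63, 220.0]
    print([hz_to_midi(f) for f in contour])
    print(semitone_intervals(contour))
```

A style rule could then be expressed as a list of semitone steps applied to a base note, with `midi_to_hz` converting the result back to target frequencies for pitch manipulation. How the paper actually encodes its emotional styles is not stated in the abstract.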
Related resources
Outlines of Burcas - A simple concatenation-based MIDI-to-singing voice synthesis system
The present paper outlines a simple system (yet to be completed) for concatenation-based singing synthesis in Swedish. The system, called Burcas, takes as input a MIDI file (possibly holding multiple parts) for melody and a text file for lyrics, and it produces standard audio files as output. For the digital signal processing, the MBROLA speech generator is employed. Burcas consists of an input...
An On-the-Fly Mandarin Singing Voice Synthesis System
An on-the-fly Mandarin singing voice synthesis system, called SINVOIS (singing voice synthesis), is proposed in this paper. The SINVOIS system can receive the continuous speech of the lyrics of a song, and generate the singing voice immediately based on the music score information (embedded in a MIDI file) of the song. Two sub-systems are designed and embedded into the system. One is the synthe...
Unique technological voice method (The YUBA Method) shows clear improvement in patients with cochlear implants in singing.
It is known that children with cochlear implants tend to sing off-key, monotonously, and flat. There are a few reports that it is possible to improve off-key singing mainly through instruction using the falsetto voice for people with normal hearing. We examined whether their singing skills could be improved through instruction. Eight subjects (five boys and three girls aged 10.4 ± 2.4 years) wi...
Distance Metrics and Indexing Strategies for a Digital Library of Popular Music
People identify powerfully with music: someone might say “that’s my song!” but they are unlikely to say “that’s my book!” or “that’s my picture!” A digital library of popular music therefore has the potential to be a compelling application of information retrieval technology. Such a library requires a retrieval method that is appropriate for a nontechnical audience. Experiments on “query by hum...
Phonetic segmentation of singing voice using MIDI and parallel speech
When analyzing singing voice signal, it is required to know the boundaries of each phonetic unit in the singing voice samples. However, due to prolonged vowels in the singing voice, it is not easy to accurately align a singing voice with the phonetic sequence of its lyrics by conventional speech recognition approach. This paper proposes a solution for the phonetic annotation of the singing voic...